CSS v1.2.0: Phase A — parametric coverage additions#6938
Conversation
Asserts that span.http.method and span.http.route metadata are populated as HTTPMethod and HTTPEndpoint in the /v0.6/stats payload, per CSS v1.2.0 spec §5. Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts Hostname, Env, Version, Service, RuntimeID, and Sequence are populated in the /v0.6/stats payload per CSS v1.2.0 spec §3 (deployment-level identifiers). Mark nodejs, php, ruby, cpp as missing_feature since stats computation isn't implemented.
Asserts that ContainerID, Tags, ImageTag, AgentAggregation, and ProcessTagsHash are absent or empty in the tracer-sent /v0.6/stats payload, per CSS v1.2.0 spec §3 (these fields are agent-populated). Mark nodejs, php, ruby, cpp as missing_feature.
Asserts spans with the _dd.partial_version metric set are excluded from stats aggregation, per CSS v1.2.0 spec §7 (Span Exclusions). A control span without the metric must still produce stats. Mark nodejs, php, ruby, cpp as missing_feature.
|
|
Test agent returns the /v0.6/stats request body as a base64-encoded str, not bytes. Mypy correctly flagged the annotation mismatch on TS011.
Split type-and-truthiness assertions into separate checks per ruff's pytest rule against compound assert statements.
🎉 All green!🧪 All tests passed 🔗 Commit SHA: 306dd8d | Docs | Datadog PR Page | Give us feedback! |
…ey fail CI revealed real per-SDK gaps: - python: parametric harness does not flush /v0.6/stats (TS011-TS014 all) - golang: HTTPEndpoint not populated from http.route; Hostname not set on payload (TS011, TS012) - dotnet: _dd.partial_version spans not excluded from stats (TS014) These reflect real implementation gaps in those SDKs (or the parametric harness), not test bugs — markers explain the gap per spec section.
Resolve manifest/dotnet.yml conflict on TS001 version (main bumped to <3.43.0). Replace ': ' with ' - ' in CSS v1.2.0 missing_feature reasons to avoid YAML parsing errors (colon was being interpreted as a mapping value).
Java fails TS011 (HTTPMethod=None), TS012 (Hostname=''), and TS014 (partial.snapshot not excluded) on both dev and prod. TS013 passes. Rust fails all 4 (parametric harness does not flush /v0.6/stats, same root cause as python).
TS011: was setting only http.route. dd-trace-go (and the spec field name) reads http.endpoint. Now sets both http.endpoint and http.route. Also adds DD_TRACE_RESOURCE_RENAMING_ENABLED=true so dd-trace-java's gate on HTTPMethod/HTTPEndpoint extraction (Config.java:2278) is on. TS012: tracers do not auto-detect hostname in the parametric harness; pin DD_HOSTNAME=test-host so the field is populated as the spec requires. Removed missing_feature markers from golang (TS011, TS012) and java (TS011, TS012) — those were test bugs, not implementation gaps. Java's TS014 marker remains: dd-trace-java's exclusion uses the internal longRunningVersion field, not the _dd.partial_version metric, which the spec mandates.
…ERVICE env Root cause of python failures: dd-trace-py >= 3.x delegates CSS to libdatadog's native TraceExporter. The exporter only flushes /v0.6/stats on its 10-second internal timer or on shutdown — the parametric server's /trace/stats/flush endpoint was still using the long-removed Python-side SpanStatsProcessorV06 and silently no-op'd. - Update the parametric server to fall back to writer.on_shutdown() + writer.recreate() when the legacy processor is absent. This deterministically flushes libdatadog stats at the end of each parametric test. - TS012 also needed DD_SERVICE (payload-level Service is the configured main service name per spec §3, not the per-span service). Added to library_env alongside DD_HOSTNAME. With these fixes, all four python CSS tests pass locally. Removing python missing_feature markers for TS011-TS014.
dd-trace-go option.go:297 and dd-trace-java Config.java:2005 only populate the Hostname field on the ClientStatsPayload when DD_TRACE_REPORT_HOSTNAME is on. Without it, both SDKs return empty Hostname even when DD_HOSTNAME is set.
dd-trace-go stats.go:181 only writes Service at the per-bucket StatSpanConfig level (the span's service), not at the payload level. The spec mandates Service in ClientStatsPayload. This is the same divergence already documented for Go on test_top_level_service in the e2e suite.
Two fixes informed by checking the trace-agent's actual use of these fields: 1. Service: the trace-agent uses payload-level ClientStatsPayload.Service only as a partition-key hint in PayloadAggregationKey.BaseService (client_stats_aggregator.go:178). The per-bucket ClientGroupedStats.Service is the spec-required source of truth that ends up at the backend. dd-trace-go intentionally writes Service only at the per-bucket level (stats.go:181). Accept either location. 2. Env: dd-trace-java's WellKnownTags does not apply the spec's 'unknown-env' default when DD_ENV is unset. Pin DD_ENV (plus DD_VERSION for completeness) so the assertion is deterministic across SDKs. Removed the Go TS012 marker — the test now passes for Go under the lenient Service assertion.
dd-trace-go stats.go:96-103 builds the PayloadAggregationKey without a RuntimeID field, so the /v0.6/stats payload sent to the agent has RuntimeID empty. The trace-agent passes RuntimeID through to the backend (writer/stats.go:408) for message-uniqueness/deduplication but doesn't aggregate by it. Functionally non-fatal, but it's a spec-mandated field.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbdda2e69a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@ichinaski I see your AI wrote I see there are multiple stats computation testsuites, PHP implements |
Sorry for the noise, I am still assessing what the Client Libraries MCP is reporting about the status of v1.2.0 implementation. I will put together all the resources and share them for review via Slack. I will certainly need some help understanding each SDKs status. |
The PHP system-tests parametric markers said 'php has not implemented stats computation yet', but dd-trace-php shipped CSS in v1.19.0 via a hybrid C+Rust+sidecar architecture. The blocker for the Phase A parametric tests is that the PHP parametric app's /trace/stats/flush handler is a no-op (utils/build/docker/php/parametric/server.php:262), not a missing SDK feature. Update the markers to reflect this.
Both SDKs ship CSS but their parametric servers had no-op flush handlers, so TS011-TS014 had no way to drive a stats send before assertions. - nodejs: drive tracer._tracer._processor._stats.onInterval() and capture the span-stats writer's _sendPayload callback so the response waits for the HTTP PUT to /v0.6/stats to complete. - php: call dd_trace_synchronous_flush(1000); per ext/ddtrace.c:3286-3310 that invokes ddog_sidecar_flush with traces_and_stats=true, which drains both span and stats buckets through the sidecar transport in one call.
After the parametric /trace/stats/flush fixes for nodejs and php, the Tracestats markers attributed to 'harness lacks stats flush' are stale. - nodejs: drop 11 'nodejs has not implemented stats computation yet' markers. Keep TS014 with corrected reason citing the real SDK gap (span_stats.js onSpanFinished only filters TOP_LEVEL_KEY/MEASURED, not _dd.partial_version). Keep TS008 (broken everywhere). - php: collapse 11 'PHP parametric app lacks stats-flush API' markers into one class-level Test_Library_Tracestats: v1.19.0 gate, matching the existing tests/stats/test_stats.py: v1.19.0 pattern. Keep TS008. - rust: correct the misleading 'rust parametric harness does not flush /v0.6/stats' reason on TS011-TS014. dd-trace-rs only has the DD_TRACE_STATS_COMPUTATION_ENABLED config flag and no aggregation or exporter, so the harness no-op is downstream of an SDK gap.
PHP CSS stats need bucket aging >20s (libdatadog buffer_len*bucket_size) or DD_TRACE_FORCE_FLUSH_ON_SHUTDOWN+process exit. dd_trace_synchronous_flush passes force=false to the sidecar, so the parametric pattern cannot observe /v0.6/stats. Restored PHP /trace/stats/flush to NOP and updated the manifest reason to reflect the architectural limitation instead of the v1.19.0 marker. Node.js: added per-test missing_feature markers for TS001/TS003/TS005/TS006/TS007 describing the dd-trace-js SDK gaps surfaced by the now-running tests.
cbeauchesne
left a comment
There was a problem hiding this comment.
From framework usage, all good. Good you get a review from someone familair with the tested feature ?
First slice of system-tests coverage for the gaps identified in the CSS v1.2.0 status report.
What's added
Four parametric tests in
tests/parametric/test_library_tracestats.py:test_http_method_endpoint_TS011HTTPMethod/HTTPEndpointpopulated fromhttp.method/http.endpoint/http.routetest_payload_metadata_TS012Hostname,Env,Version,RuntimeID,Sequencepopulated;Servicepresent (payload-level or per-bucket)test_agent_populated_fields_empty_TS013ContainerID,Tags,ImageTag,AgentAggregation,ProcessTagsHashabsent (agent-populated)test_partial_version_excluded_TS014_dd.partial_versionset don't contribute to statsA
_find_raw_v06_statshelper reads the raw msgpack payload, since the decodedV06StatsAggrview is narrower than the spec.Parametric harness fix (python)
utils/build/docker/python/parametric/apm_test_client/server.py—/trace/stats/flushwas relying onSpanStatsProcessorV06, which dd-trace-py removed when CSS moved to libdatadog. The endpoint now falls back towriter.on_shutdown()+writer.recreate()when the legacy processor is absent. Old behavior preserved otherwise.Per-SDK summary (review focus)
missing_feature: payload-levelRuntimeIDnot set (stats.go:96-103). Other three pass.missing_feature: exclusion uses internallongRunningVersion(ConflatingMetricsAggregator.java:308), not the spec's_dd.partial_version.missing_feature:_dd.partial_versionspans not excluded.manifests/nodejs.ymlfor details on each. dd-trace-js CSS is implemented but diverges from spec on Resource field, aggregation keys, error/success separation, P0 dropping, and partial_version filtering.missing_feature (>=1.19.0): dd-trace-php ships CSS but libdatadog's sidecar holds buckets <20s (buffer_len=2 × bucket_size=10s) anddd_trace_synchronous_flushpassesforce=false, so parametric tests never observe/v0.6/stats. No userland force-flush API exists.missing_feature: CSS not implemented in tracer.Spec gaps surfaced (follow-ups, not addressed here)
dd-trace-go:
RuntimeIDnever set (ddtrace/tracer/stats.go:96-103)Servicenever set; only per-bucket (stats.go:181)HTTPEndpointreadshttp.endpoint, inconsistent with OTel'shttp.route/infoversionfield not parsed ininfoResponsestats.go:266) contradicts specapi.errorsmetric not emitted from stats endpoint error pathdd-trace-js (Node.js): see per-test manifest reasons (Resource = span name, single bucket for errors/success, no P0 drop on sample_rate=0, no
_dd.partial_versionfilter).dd-trace-php: no userland force-flush API; bucket aging blocks short-lived tests.
dd-trace-java: TS014 uses internal exclusion field instead of spec's
_dd.partial_version.Phases B–E (follow-ups on this branch)
filter_tags,filter_tags_regex,ignore_resources)Full plan in
css-spec-coverage-plan.md(out-of-repo).